The data set contains daily observations of electrical energy consumption recorded from 2010 through 2017 (2922 days, i.e. 8 years).
It includes:
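library(xts)       # xts() time-indexed objects
library(dygraphs)  # dygraph() and the dy*() options

# Build a Date column from the year, month and day columns
Date <- paste(D$annee, D$mois, D$jour, sep = "-")
D$Date <- as.Date(Date)
# Daily series starting at the day of year of the first observation in 2010
Variab <- ts(D$Energie_trans,
             start = c(2010, as.numeric(format(D$Date[1], "%j"))),
             frequency = 365)
# Index the series by date and draw an interactive plot
don <- xts(x = Variab, order.by = D$Date)
dygraph(don) %>%
  dyOptions(labelsUTC = TRUE, fillGraph = TRUE, fillAlpha = 0.1, drawGrid = FALSE, colors = "#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE) %>%
  dyRoller(rollPeriod = 1)

The plot shows that electrical energy consumption follows an upward trend.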
From the plots above, we can recognize an upward trend and a stationary remainder (noise).
As the level of the time series increases, the seasonal variation increases as well. This is why a multiplicative model should be used.
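For reference, a multiplicative decomposition is usually written as below. The symbols are notation assumed here, not taken from the report: y_t is the observed consumption, and T_t, S_t and R_t are the trend, seasonal and remainder components. Taking logarithms turns the product into a sum, which is why a series with this behaviour is often handled additively on the log scale.

```latex
y_t = T_t \times S_t \times R_t
\qquad\Longleftrightarrow\qquad
\log y_t = \log T_t + \log S_t + \log R_t
```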
Tmoy, tmin and Tmax are candidate explanatory variables. The choice of which one to include in the model is based on its correlation with energy consumption. Two strongly correlated explanatory variables should not be included in the same model, since multicollinearity makes the coefficient estimates unstable, so only one of the three is kept.
library(ggcorrplot)

# Correlation matrix of column 8 and columns 5 to 7 of D
ggcorrplot(cor(D[, c(8, 5:7)]),
           outline.col = "white",
           lab = TRUE,
           lab_size = 5,
           lab_col = "#736F6E",
           ggtheme = ggplot2::theme_gray,
           colors = c("#595959", "white", "#6D9EC1"))

In the following, tmin is retained as the best variable among tmin, Tmax and Tmoy: its correlation with energy consumption is 0.65.
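The selection rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame `demo` is synthetic and its values are invented, and only the column names mirror those used in this report.

```r
# Synthetic stand-in for D: the values are invented for illustration,
# only the column names follow the report.
set.seed(1)
n <- 200
tmin <- rnorm(n, mean = 15, sd = 5)
demo <- data.frame(
  tmin = tmin,
  Tmax = tmin + 8 + rnorm(n, sd = 5),
  Tmoy = tmin + 4 + rnorm(n, sd = 4)
)
demo$Energie_trans <- 50000 + 400 * demo$tmin + rnorm(n, sd = 1500)

candidates <- c("tmin", "Tmax", "Tmoy")

# Correlation of each candidate with the response
cors <- sapply(candidates, function(v) cor(demo[[v]], demo$Energie_trans))

# Keep the candidate with the largest absolute correlation
best <- names(which.max(abs(cors)))

# Pairwise correlations among the candidates: high values mean the
# variables carry overlapping information, so only one is kept
cand_cor <- cor(demo[, candidates])
print(round(cors, 2))
print(best)
```

On the real data, the same two steps would use `D` in place of `demo`; the correlation plot above conveys the same information graphically.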
The tslm function is used to explain electrical energy consumption from several explanatory variables:
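library(forecast)  # tslm(), forecast(), accuracy()

# Hold out the last year (365 days) to evaluate the forecast later
train <- D[1:2557, ]
test <- D[2558:2922, ]
# Convert the training set to a daily time series
train <- ts(train, start = c(2010, as.numeric(format(train$Date[1], "%j"))), frequency = 365)
# Linear regression on year, tmin, JF, Ramadhan, and month and weekday indicators
fit <- tslm(Energie_trans ~ annee + tmin + JF + Ramadhan + Janvier +
              Fevrier + Mars + Avril + Mai + Juin + Juillet + Aout + Septembre + Octobre + Novembre +
              Lundi + Mardi + Mercredi + Jeudi + Vendredi + Samedi, data = train)
summary(fit)

##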
## Call:
## tslm(formula = Energie_trans ~ annee + tmin + JF + Ramadhan +
## Janvier + Fevrier + Mars + Avril + Mai + Juin + Juillet +
## Aout + Septembre + Octobre + Novembre + Lundi + Mardi + Mercredi +
## Jeudi + Vendredi + Samedi, data = train)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -23000.4  -2241.9    -74.7   2079.2  18662.4
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.229e+06  8.535e+04 -49.552  < 2e-16 ***
## annee        2.128e+03  4.240e+01  50.195  < 2e-16 ***
## tmin         3.812e+02  3.157e+01  12.073  < 2e-16 ***
## JF          -5.907e+03  4.391e+02 -13.451  < 2e-16 ***
## Ramadhan     6.790e+02  3.610e+02   1.881   0.0601 .
## Janvier     -8.135e+02  4.133e+02  -1.968   0.0492 *
## Fevrier     -8.642e+01  4.245e+02  -0.204   0.8387
## Mars        -3.011e+03  4.118e+02  -7.312 3.52e-13 ***
## Avril       -4.693e+03  4.248e+02 -11.046  < 2e-16 ***
## Mai         -2.455e+03  4.419e+02  -5.556 3.04e-08 ***
## Juin         4.258e+03  5.000e+02   8.516  < 2e-16 ***
## Juillet      1.396e+04  5.741e+02  24.321  < 2e-16 ***
## Aout         1.437e+04  5.823e+02  24.681  < 2e-16 ***
## Septembre    5.096e+03  5.422e+02   9.400  < 2e-16 ***
## Octobre     -2.153e+03  4.848e+02  -4.442 9.31e-06 ***
## Novembre    -4.499e+03  4.318e+02 -10.419  < 2e-16 ***
## Lundi        6.858e+03  3.171e+02  21.627  < 2e-16 ***
## Mardi        8.192e+03  3.171e+02  25.838  < 2e-16 ***
## Mercredi     8.340e+03  3.171e+02  26.306  < 2e-16 ***
## Jeudi        8.438e+03  3.171e+02  26.614  < 2e-16 ***
## Vendredi     8.226e+03  3.168e+02  25.963  < 2e-16 ***
## Samedi       5.158e+03  3.171e+02  16.266  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4286 on 2535 degrees of freedom
## Multiple R-squared: 0.833, Adjusted R-squared: 0.8317
## F-statistic: 602.3 on 21 and 2535 DF, p-value: < 2.2e-16

accuracy(fit)
##                        ME     RMSE     MAE        MPE     MAPE      MASE      ACF1
## Training set 1.615193e-13 4267.356 3046.05 -0.3339313 4.319366 0.5606067 0.7842202

The model explains about 83.2% of the variance in energy consumption on the training set (adjusted R-squared = 0.8317).
The model fitted above is now used to forecast one year of electrical energy consumption in Tunisia.
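As a quick check, the adjusted R-squared can be recomputed from the figures printed in the summary output. The sketch below uses the standard adjustment formula; the variable names are chosen here for illustration. Because the printed multiple R-squared is rounded to three decimals, the result only matches the reported 0.8317 approximately.

```r
# Figures taken from the summary output above
r2 <- 0.833   # multiple R-squared (rounded in the printout)
n  <- 2557    # number of training observations
p  <- 21      # number of predictors, excluding the intercept

# Residual degrees of freedom, as reported in the summary
df_resid <- n - p - 1

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
print(adj_r2)
```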
library(plotly)

# Forecast the held-out year from the regressors in test
fore <- forecast(fit, test)
# Observed series, 95% interval and point forecast
p <- plot_ly() %>%
  add_lines(x = D$Date, y = D$Energie_trans,
            color = I("#595959"), name = "observed") %>%
  add_ribbons(x = test$Date, ymin = fore$lower[, 2], ymax = fore$upper[, 2],
              color = I("#b3b3ff"), name = "95% CI") %>%
  add_lines(x = test$Date, y = fore$mean, color = I("#0073e6"), name = "prediction")
p

Visually, the forecast values follow the observed consumption closely over the held-out year.
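The visual impression can be backed by error measures computed on the held-out year. The following is a minimal sketch in base R: `holdout_accuracy` is a hypothetical helper written for this illustration, and the toy vectors are invented values, not results from this data set.

```r
# Hypothetical helper: error measures of a forecast against held-out values
holdout_accuracy <- function(actual, predicted) {
  err <- actual - predicted
  c(
    RMSE = sqrt(mean(err^2)),               # root mean squared error
    MAE  = mean(abs(err)),                  # mean absolute error
    MAPE = 100 * mean(abs(err / actual))    # mean absolute percentage error
  )
}

# Toy values, invented for illustration
actual    <- c(100, 110, 120, 130)
predicted <- c(102, 108, 123, 127)
res <- holdout_accuracy(actual, predicted)
print(res)
```

With the objects from this report, the call would be along the lines of `holdout_accuracy(test$Energie_trans, as.numeric(fore$mean))`. The forecast package's `accuracy(fore, test$Energie_trans)` reports comparable measures for both the training and the test set.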
The forecast indicates that electrical energy consumption will continue to follow an increasing trend over time.